Efficient Support Vector Learning for Large Datasets
Abstract
In recent years, we have witnessed a significant increase in the amount of data in digital format, due to the widespread use of computers and advances in storage systems. As the volume of digital information grows, so does the need for more effective tools to find, filter and manage these resources. Developing fast and highly accurate algorithms to classify digital data automatically has therefore become an important part of machine learning and knowledge discovery research. The goal of this thesis is to introduce a fast online Support Vector Machine (SVM) classifier algorithm that preserves the highly competitive classification accuracy of state-of-the-art SVM solvers while requiring fewer computational resources. The speed improvement and the lower memory demand of the online learning setting make SVMs applicable to very large data sets.
Similar resources
Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease
Background and purpose: Machine learning is a class of modern, powerful tools that can solve many of the important problems people face today. Support vector regression (SVR), a member of the machine learning family, is a way to build a regression model. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...
Scalable Twin Neural Networks for Classification of Unbalanced Data
Twin Support Vector Machines (TWSVMs) have emerged as an efficient alternative to Support Vector Machines (SVMs) for learning from imbalanced datasets. The TWSVM learns two non-parallel classifying hyperplanes by solving a pair of smaller-sized problems. However, it is unsuitable for large datasets, as it involves matrix operations. In this paper, we discuss a Twin Neural Network (Twin NN) archit...
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
A New Play-off Approach in League Championship Algorithm for Solving Large-Scale Support Vector Machine Problems
There are numerous methods for solving large-scale problems, some of which are very flexible and efficient in both linear and non-linear cases. The league championship algorithm is one such algorithm that may be used for these problems. In the current paper, a new play-off approach is adapted to the league championship algorithm for solving large-scale problems. The proposed algori...
A Hybrid Approach for an Efficient Classification Using Decision Tree and Svm
Real-world databases have seen significant growth in the volume of data in digital format, due to the extensive use of datasets and storage systems. It is essential to develop fast and accurate algorithms to classify large data automatically. As the data size increases, the proposed method makes computation faster, and a scalable machine learning algorithm is used to learn faster fr...
Combine Vector Quantization and Support Vector Machine for Imbalanced Datasets
In cases of extremely imbalanced, high-dimensional datasets, standard machine learning techniques tend to be overwhelmed by the large classes. This paper rebalances skewed datasets by compressing the majority class. The approach combines Vector Quantization and Support Vector Machines into a new method, VQ-SVM, that rebalances datasets without significant information loss. Some issue...